Fine-grained classification of social science journal articles using textual data: A comparison of supervised machine learning approaches

Abstract

We compare two supervised machine learning algorithms, Multinomial Naïve Bayes and Gradient Boosting, to classify social science articles using textual data. The high level of granularity of the classification scheme used and the possibility that multiple categories are assigned to a document make this task challenging. To collect the training data, we query three discipline-specific thesauri to retrieve articles corresponding to specialties in the classification. The resulting data set consists of 113,909 records and covers 245 specialties, aggregated into 31 subdisciplines from three disciplines. Experts were consulted to validate the thesauri-based classification. The resulting multilabel data set is used to train the machine learning algorithms in different configurations. We deploy a multilabel classifier chaining model, allowing for an arbitrary number of categories to be assigned to each document. The best results are obtained with Gradient Boosting. The approach does not rely on citation data. It can therefore be applied in settings where such information is not available. We conclude that fine-grained text-based classification of social sciences publications at a subdisciplinary level is a hard task, for humans and machines alike. A combination of human expertise and machine learning is suggested as a way forward to improve the classification of social sciences documents.
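The classifier chaining idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `TinyMultinomialNB` and `TinyClassifierChain`, the six toy documents, and the two label columns are all hypothetical, and a small pure-Python multinomial Naive Bayes stands in for the algorithms compared in the paper. The sketch assumes one binary classifier per category, where each classifier also sees the labels assigned earlier in the chain, so a document can receive any number of categories.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Binary multinomial Naive Bayes over token lists (Laplace smoothing)."""

    def fit(self, docs, labels):
        self.vocab = {tok for doc in docs for tok in doc}
        self.class_counts = Counter(labels)
        self.token_counts = {0: Counter(), 1: Counter()}
        for doc, y in zip(docs, labels):
            self.token_counts[y].update(doc)
        self.n_docs = len(docs)
        return self

    def predict_one(self, doc):
        scores = {}
        for y in (0, 1):
            # Smoothed log prior, so a class with no examples never yields log(0).
            score = math.log((self.class_counts[y] + 1) / (self.n_docs + 2))
            total = sum(self.token_counts[y].values()) + len(self.vocab)
            for tok in doc:
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.token_counts[y][tok] + 1) / total)
            scores[y] = score
        return 1 if scores[1] > scores[0] else 0


class TinyClassifierChain:
    """One binary model per label; each model sees the labels before it."""

    def __init__(self, n_labels):
        self.models = [TinyMultinomialNB() for _ in range(n_labels)]

    @staticmethod
    def _augment(doc, previous):
        # Earlier labels in the chain are appended as pseudo-tokens.
        return list(doc) + [f"__label{j}={v}" for j, v in enumerate(previous)]

    def fit(self, docs, label_rows):
        for i, model in enumerate(self.models):
            augmented = [self._augment(d, row[:i]) for d, row in zip(docs, label_rows)]
            model.fit(augmented, [row[i] for row in label_rows])
        return self

    def predict_one(self, doc):
        predicted = []
        for model in self.models:
            predicted.append(model.predict_one(self._augment(doc, predicted)))
        return predicted


# Hypothetical toy corpus; label columns: [political science, education].
docs = [
    "voter turnout election survey".split(),
    "election campaign party voter".split(),
    "classroom teacher pupil curriculum".split(),
    "teacher training curriculum school".split(),
    "civic education school election".split(),
    "pupil voter civic curriculum".split(),
]
labels = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]]

chain = TinyClassifierChain(n_labels=2).fit(docs, labels)
education_doc = chain.predict_one("teacher curriculum classroom".split())
politics_doc = chain.predict_one("voter turnout campaign".split())
print(education_doc, politics_doc)
```

In a realistic setting the token lists would be replaced by a proper text representation and the base model by a library classifier; the order of the chain is a design choice, since each model can only condition on the labels placed before it.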

Similar resources

A comparison of linguistic and pragmatic knowledge: A case of Iranian learners of English

This study compared the linguistic and pragmatic knowledge of Iranian language learners at the upper-intermediate level. Fifty students with similar educational backgrounds, drawn from six different language institutes, took a test of linguistic knowledge and a test of English pragmatic knowledge, both designed by the researcher. As a secondary aim, the study also examined how effectively language textbooks provide sufficient input for Iranian learners ...

Comparison of Machine Learning Algorithms for Broad Leaf Species Classification Using UAV-RGB Images

Abstract: Knowing the tree species combination of forests provides valuable information for studying the forest’s economic value, fire risk assessment, biodiversity monitoring, and wildlife habitat improvement. Fieldwork is often time-consuming and labor-intensive, free satellite data are available only in coarse resolution, and the use of manned aircraft is relatively costly. Recently, unmanned aeria...

Supervised Machine Learning Approaches: a Survey

One of the core objectives of machine learning is to instruct computers to use data or past experience to solve a given problem. A good number of successful applications of machine learning already exist, including classifiers trained on email messages to distinguish between spam and non-spam messages, systems that analyze past sales data to predict customer buying behavi...

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

A data stream is a sequence of data generated from various information sources at high speed and in high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, the challenge of unlimited stream length is commonly met by dividing the stream into fixed-size windows or by using gradual forgetting. Concept drift refer...

Fine-Grained Genre Classification Using Structural Learning Algorithms

Prior use of machine learning in genre classification used a list of labels as classification categories. However, genre classes are often organised into hierarchies, e.g., covering the subgenres of fiction. In this paper we present a method of using the hierarchy of labels to improve the classification accuracy. As a testbed for this approach we use the Brown Corpus as well as a range of other...

Journal

Journal title: Quantitative Science Studies

Year: 2021

ISSN: 2641-3337

DOI: https://doi.org/10.1162/qss_a_00106